Characterizing short germline-specific promoters with a range of expression levels in C. elegans

A core tenet of synthetic biology is that well-characterized regulatory elements are essential for engineering biological systems. Here, we characterize the specificity and expression levels of 18 short (254 to 880 bp) candidate germline promoters using a single-copy gfp reporter assay in C. elegans. Six promoters resulted in ubiquitous expression, three did not drive detectable expression, and nine were germline-specific. Several promoters drove stronger germline expression than the commonly used mex-5 promoter. The promoters span a range of expression levels, enabling, for example, low expression of toxic transgenes or high expression of gene-editing enzymes, and their compactness facilitates gene synthesis.


Figure 1. Functional Characterization of Germline-Specific Promoters in C. elegans:
(A) Properties of selected candidate promoters and summary of results. RNA-seq data from Serizay et al. (2020) are reported in Transcripts Per Million (TPM). We calculated the relative germline expression as a fraction of expression in all major tissues (germline, neurons, intestines, hypodermis, and muscles). We selected promoter regions from the start codon of the candidate gene to the start or stop codon of the adjacent protein-coding gene. Homopolymer stretches of 10 bp or more and BsaI type IIS restriction sites in the selected promoter regions were modified by the introduction of base substitutions to facilitate synthesis and Golden Gate assembly. (B) Expression patterns of the ubiquitous eef-1A.1p (alternative name eft-3p) and germline mex-5p controls. (C-E) We classified germline-specific candidate promoters by careful visual inspection for somatic expression on a fluorescence microscope at high magnification (40x, oil objective). Due to very high GFP expression in the germline, somatic expression is not easily visible in all images. (C) GFP expression patterns of promoters that showed both somatic and germline fluorescence: his-61p, his-64p, hil-4p, rla-0p, rpl-7Ap, and Y37E3.8p. (D) Promoters with no detectable GFP expression: puf-11p, F23A7.8p, and clec-87p. (E) Promoters with germline-specific expression: klp-19p, his-68p, W05F2.3p, spn-4p, pos-1p, Y75B12B.1p, puf-5p, mei-2p, and mesp-1p. Images were taken using a 20x air objective. Scale bars = 20 μm. (F) Top: Visual quantification of germline-specific GFP expression by scoring transgenic animals (blinded to genotype) from 0 (no expression) to 4 (high expression) on a fluorescence dissection microscope. Neg. Ctrl. refers to non-transgenic N2 animals. Bars

Description
The precise manipulation of biological systems requires a versatile synthetic biology toolkit of regulatory elements. Libraries of standardized genetic "parts" have permitted control over complex metabolic pathways to produce valuable chemicals or introduce novel traits in various biological systems, including bacteria, yeast, and plants (Choi et al., 2019). Germline promoters are particularly useful for heritable genome editing, as well as the study and manipulation of germline processes. Precise control over transgene expression levels is advantageous, as protein function and toxicity are generally dosage-dependent. For example, in C. elegans, overexpression of the microtubule force regulator GPR-1 by codon adaptation was able to change Mendelian inheritance by forcing premature cell division in the early embryo (Redemann et al., 2011; Besseling & Bringmann, 2016). As an alternative approach to modulate transgene expression, Artiles et al. (2019) generated random gpr-1::gfp insertions and relied on position effect variegation to isolate viable lines with "Goldilocks" expression: stable expression that is just high enough to generate a high frequency of non-Mendelian inheritance without obvious toxicity. Moreover, while two germline promoters (mex-5p and pie-1p) are frequently used in C. elegans, ubiquitous promoters with high germline expression (eef-1A.1p (prior nomenclature eft-3p), smu-1p, and smu-2p) are often used for efficient genome editing (Frøkjaer-Jensen et al., 2012; Aljohani et al., 2020). These examples highlight the importance of controlling transgene expression to engineer biological systems and generate desired outcomes. We reasoned that advances in tissue-specific sequencing provide new opportunities for the rational identification of regulatory components with particular characteristics. Here, we characterize short putative germline promoters across a range of expression levels in C. elegans.
We identified candidate promoters using tissue-specific RNA-seq data from Serizay et al. (2020) and selected short promoters (less than 1 kb) from genes with high absolute and relative germline expression (Figure 1A). To facilitate synthesis and cloning, we modified homopolymers of 10 bp or more and BsaI recognition sites using single nucleotide substitutions. We cloned candidates and controls (ubiquitous eef-1A.1p and germline mex-5p) into vectors containing a codon-optimized gfp with nuclear localization signals and a germline-permissive 3' UTR (tbb-2) using Golden Gate Assembly (Merritt et al., 2008; Engler et al., 2009; Fielmich et al., 2018). We tested the expression of each transgene from single-copy insertions into a germline-permissive safe-harbor landing site on chromosome II (8.24 Mb) using MosTI (El Mouridi et al., 2022).
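The selection criteria above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary keys, the example TPM values, and the two expression thresholds are hypothetical, since the text does not state numeric cutoffs for "high" absolute or relative germline expression. Only the five tissues and the roughly 1 kb length limit come from the text.

```python
from typing import Dict

# Tissues listed in the Figure 1A legend as the "major tissues".
MAJOR_TISSUES = ("germline", "neurons", "intestine", "hypodermis", "muscle")


def relative_germline_expression(tpm: Dict[str, float]) -> float:
    """Germline TPM as a fraction of the summed TPM across the major tissues."""
    total = sum(tpm[tissue] for tissue in MAJOR_TISSUES)
    if total == 0:
        return 0.0  # gene not detected in any tissue
    return tpm["germline"] / total


def is_candidate(tpm: Dict[str, float], promoter_length_bp: int,
                 min_germline_tpm: float, min_fraction: float,
                 max_length_bp: int = 1000) -> bool:
    """Apply the three filters: absolute TPM, relative fraction, promoter length.

    min_germline_tpm and min_fraction are hypothetical thresholds supplied by
    the caller; the source does not give numeric values for them.
    """
    return (tpm["germline"] >= min_germline_tpm
            and relative_germline_expression(tpm) >= min_fraction
            and promoter_length_bp <= max_length_bp)


# Made-up TPM values for illustration only (not data from the study).
example_tpm = {"germline": 300.0, "neurons": 50.0, "intestine": 50.0,
               "hypodermis": 50.0, "muscle": 50.0}
example_fraction = relative_germline_expression(example_tpm)  # 300 / 500
```

Expressing the fraction relative to the sum over tissues keeps the measure between 0 and 1, so a gene with modest absolute expression can still rank as highly germline-enriched.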
We verified that the constructs and insertion strategy produce ubiquitous and germline gfp expression using the standard eef-1A.1p and mex-5p promoters (Figure 1B). We then assessed the tissue-specificity and expression patterns of each candidate promoter using fluorescence microscopy. We observed GFP expression in somatic cells, in addition to the germline, in six candidates (his-61p, his-64p, hil-4p, rla-0p, rpl-7Ap, and Y37E3.8p) (Figure 1C). Expression in the soma from his-61p, his-64p, and hil-4p was limited compared to the broad expression found using rla-0p, rpl-7Ap, and Y37E3.8p. The specificity of these promoters reflects the relative germline expression measurements from RNA-seq data (Figure 1A). Although multiple candidates are expressed in the soma and the germline, promoters such as his-61p (327 bp) and his-64p (254 bp) provide shorter alternatives to eef-1A.1p (579 bp).
We observed no detectable GFP expression from puf-11p, F23A7.8p, and clec-87p reporter constructs (Figure 1D), despite the high germline RNA expression for clec-87 (2,352 TPM) and the highest fractional germline expression for F23A7.8p (72%) (Figure 1A). The discrepancy with RNA-seq data could be due to regulation in the native genomic context outside the selected promoter region. ATAC-seq data show potential germline enhancers for clec-87 and puf-11 near their 3' UTR (Serizay et al., 2020), suggesting possible distal regulation of germline expression. However, no clear open chromatin peaks are reported at the F23A7.8 locus. F23A7.8 is the only candidate gene selected from chromosome X, which is mostly silenced in the germline (Kelly et al., 2002). These results highlight the need to experimentally test putative regulatory elements, and these promoters could potentially serve as minimal promoters to screen for distal cis-regulatory elements.
Nine promoters (klp-19p, his-68p, W05F2.3p, spn-4p, pos-1p, Y75B12B.1p, puf-5p, mei-2p, and mesp-1p) produced GFP expression that was specific to the germline (Figure 1E). We quantified GFP expression levels using two assays: a blinded visual screen under a fluorescence dissection microscope and measurement on a COPAS flow cytometer (Figure 1F). The relative expression levels generally correlated across the two quantification methods, with the exception of mei-2p and mex-5p. The quantification revealed a range of expression levels in the germline. Notably, klp-19p and his-68p consistently produced GFP that was brighter than that produced by mex-5p. Because their expression is both high and germline-specific, these promoters are promising options for enhancing gene editing efficiency while limiting somatic background. puf-5p, pos-1p, and Y75B12B.1p offer medium expression levels, and mei-2p and mesp-1p offer relatively low expression levels. These lower-expression promoters are well-suited for experiments that require modest protein levels or reduced transgene toxicity in the germline. The relatively short length of these promoters makes them practical for gene synthesis. Therefore, we have incorporated the sequences of his-64p for ubiquitous and klp-19p (high), Y75B12B.1p (medium), and mesp-1p (low) for germline expression in an online application (https://www.wormbuilder.org/transgenebuilder/) (Vargas-Velazquez, El Mouridi, Alkhaldi, and Frøkjaer-Jensen, manuscript in preparation) for convenient transgene design and synthesis.
In summary, we characterized a set of germline-specific promoters that allow control over a range of expression levels in the germline. Our findings highlight the need for functional validation of RNA-seq and ATAC-seq data to annotate promoters. The newly characterized regulatory elements expand the growing C. elegans synthetic biology toolbox with short promoters that fine-tune germline expression and provide the means to regulate biological pathways with precision, improve genome editing specificity and efficiency, and mitigate potential transgene toxicity.

Strains
We maintained animals on Nematode Growth Media (NGM) plates seeded with either OP50 or HB101 Escherichia coli and cultured plates at either 20°C or 25°C (Brenner, 1974).

Molecular Biology
We designed all vectors in silico using the molecular biology editor ApE (Davis & Jorgensen, 2022). As criteria for selecting candidate germline promoters, we picked a mixture of trans-spliced and non-trans-spliced genes (Bernard et al., 2023) with high absolute and relative germline expression from Serizay et al. (2020). We further restricted candidates to those with promoter regions of 1 kb or less (defined from the start codon to the start or stop codon of the upstream protein-coding gene). Using single nucleotide substitutions, we modified homopolymers of 10 bp or more and BsaI restriction enzyme sites from endogenous sequences. We flanked all promoters with donor BsaI restriction sites and overhangs and synthesized them as gene fragments (Twist Bioscience, CA, USA). We synthesized the destination plasmid as a clonal vector (Twist Bioscience, CA, USA) and included the appropriate acceptor BsaI sites, a consensus start site (aaaa), a gfp with two nuclear localization signals (SV40 and egl-13), a tbb-2 3' UTR (Merritt et al., 2008), and a universal MosTI backbone (pSEM246) that contains a non-rescuing cbr-unc-119 fragment and homology arms targeting a MosTI safe-harbor landing site on chromosome II (El Mouridi et al., 2022). The gfp was codon optimized as in Fielmich et al. (2018) and designed following guidelines in Aljohani et al. (2020) but without PATC-rich introns. We generated repair templates using NEBridge® BsaI-HFv2 Golden Gate Assembly (New England Biolabs Cat. # E1601S) (Engler et al., 2009). Final expression vectors contained an identical 15 bp stretch, which includes a partial attB1 site and the consensus start site, between the endogenous promoter sequence and the start codon of gfp. All vectors were verified using restriction digestion and Sanger sequencing.
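The sequence clean-up step (locating homopolymers of 10 bp or more and internal BsaI sites, then removing them with single nucleotide substitutions) could be checked with a short script such as the sketch below. It is not the authors' procedure, which the text does not describe beyond the rule itself: the function names and the example sequence are hypothetical, and the choice of which base to substitute is left to the user. The BsaI recognition sequence (GGTCTC, with reverse complement GAGACC) is the standard one for this enzyme.

```python
import re
from typing import List, Tuple

# BsaI recognition sequence on the forward strand and its reverse complement.
BSAI_SITES = ("GGTCTC", "GAGACC")


def find_homopolymers(seq: str, min_len: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start, end, base) for each run of a single base >= min_len long."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def find_bsai_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (start, site) for each BsaI recognition site on either strand."""
    seq = seq.upper()
    hits = []
    for site in BSAI_SITES:
        start = seq.find(site)
        while start != -1:
            hits.append((start, site))
            start = seq.find(site, start + 1)
    return sorted(hits)


def substitute(seq: str, position: int, new_base: str) -> str:
    """Return seq with one base replaced at a 0-based position."""
    return seq[:position] + new_base + seq[position + 1:]


# Made-up example sequence: a 12-bp poly-A run followed by one BsaI site.
promoter = "ACGT" + "A" * 12 + "CCGGTCTCTT"
runs = find_homopolymers(promoter)    # one run spanning positions 4-16
sites = find_bsai_sites(promoter)     # one site starting at position 18

# Break the run near its middle and disrupt the BsaI site, then re-check,
# because a substitution can in principle create a new unwanted motif.
fixed = substitute(substitute(promoter, 10, "T"), 20, "A")
```

Re-running both scans on the edited sequence is the important design choice here: it confirms that each substitution removed the targeted feature without introducing a new homopolymer or BsaI site.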

Software
An online application for designing transgenes is available at www.wormbuilder.org/transgenebuilder. A manuscript detailing the application is in preparation by the authors (Vargas-Velazquez, El Mouridi, Alkhaldi, and Frøkjaer-Jensen). The version described in this manuscript has been archived in Caltech Data (DOI: 10.22002/qs7eh-g0669).

Fluorescence Quantification
We selected one independent insertion line for each candidate promoter for fluorescence quantification. We noted early germline GFP expression using the puf-11 promoter that was rapidly silenced. We, therefore, grew the strains for multiple generations to ensure stable transgene expression. To minimize bias during quantification, we blinded N2 and transgenic insertion lines to their genotype. We then synchronized them by egg prep in a drop of bleaching solution (Stiernagle, 2006). The following day, we moved five animals to 10 fresh OP50 plates per strain and cultured them at 25°C for five days. Visual Quantification: We scored three plates per strain with a mixed-stage population by eye on a Kramer Scientific FBS10 LX microscope equipped with 2x (Plan APO Objective 0.055NA), 10x (Plan APO Objective 0.3NA), and 20x (Plan APO Objective 0.42NA) objectives and an X-Cite 200DC illuminator using the following scale: 0 (no detectable expression at 20x), 1 (detectable at 10x with zoom), 2 (detectable at 10x without zoom), 3 (detectable at 2x with zoom), and 4 (detectable at 2x without zoom). Flow Cytometry: We washed seven plates per strain twice with M9 to remove particles and bacteria. We measured the fluorescence of a mixed-stage population using a COPAS FP-250 μm large-particle flow cytometer (Union Biometrica) equipped with 488 nm and 561 nm excitation lasers. To select adult worms, we filtered raw data by TOF (time of flight) between 1500 and 1800 and peak-height extinction below 35,000 using Microsoft Excel for Mac (v16.70). We subtracted the mean peak-height GFP measurement from non-transgenic N2 animals from all reported values. We generated plots using GraphPad Prism 9 for macOS (v9.5.1).
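The flow cytometry gating and background subtraction were performed in Microsoft Excel; the equivalent logic can be sketched in Python as follows. The record keys (`tof`, `ext_peak`, `gfp_peak`) and the example events are hypothetical and do not reflect the COPAS export format. The sketch also assumes that the TOF bounds are inclusive and that the N2 background is computed from N2 events passing the same adult gate, neither of which is specified in the text.

```python
from statistics import mean
from typing import Dict, List

Event = Dict[str, float]


def select_adults(events: List[Event], tof_min: float = 1500,
                  tof_max: float = 1800,
                  max_extinction: float = 35000) -> List[Event]:
    """Keep events with TOF in [tof_min, tof_max] and extinction peak height below the cutoff."""
    return [e for e in events
            if tof_min <= e["tof"] <= tof_max and e["ext_peak"] < max_extinction]


def background_subtracted_gfp(strain_events: List[Event],
                              n2_events: List[Event]) -> List[float]:
    """Subtract the mean GFP peak height of gated N2 animals from each gated strain event."""
    background = mean(e["gfp_peak"] for e in select_adults(n2_events))
    return [e["gfp_peak"] - background for e in select_adults(strain_events)]


# Made-up events for illustration only (not data from the study).
n2 = [
    {"tof": 1600, "ext_peak": 20000, "gfp_peak": 100},
    {"tof": 1700, "ext_peak": 30000, "gfp_peak": 200},
    {"tof": 900, "ext_peak": 5000, "gfp_peak": 5000},     # too small: excluded
]
strain = [
    {"tof": 1550, "ext_peak": 10000, "gfp_peak": 1150},   # passes the gate
    {"tof": 1750, "ext_peak": 40000, "gfp_peak": 9000},   # extinction too high
    {"tof": 2000, "ext_peak": 10000, "gfp_peak": 3000},   # TOF too high
]
corrected = background_subtracted_gfp(strain, n2)
```

Applying the same gate to the N2 control and to each transgenic strain keeps the background estimate comparable to the animals being measured.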

Imaging
We immobilized animals using 50 mM sodium azide in M9 and mounted them on 2% agarose pads. We took images on an upright, non-motorized, compound microscope (Leica DM2500 with a Leica DFC7000 GT camera and Leica SFL4000 LED light source) with a 20x air objective. We maintained constant exposure time, gain, and binning in all images.

Table 3 | Plasmids generated in this study (sequences available on request).